Abstract

Spectral partitioning is a simple, nearly-linear time, algorithm to find sparse cuts, and the 
Cheeger inequalities provide a worst-case guarantee for the quality of the approximation found 
by the algorithm. Local graph partitioning algorithms [ST08, ACL06, AP09] run in time that 
is nearly linear in the size of the output set, and their approximation guarantee is worse than 
the guarantee provided by the Cheeger inequalities by a polylogarithmic $\log^{\Omega(1)} n$ factor. It
has been an open problem to design a local graph clustering algorithm with an approximation 
guarantee close to the guarantee of the Cheeger inequalities and with a running time nearly 
linear in the size of the output. 

In this paper we solve this problem; we design an algorithm with the same guarantee (up to 
a constant factor) as the Cheeger inequality, that runs in time slightly super linear in the size of 
the output. This is the first sublinear (in the size of the input) time algorithm with almost the 
same guarantee as Cheeger's inequality. As a byproduct of our results, we prove a bicriteria
approximation algorithm for the expansion profile of any graph. Let $\phi(\gamma) := \min_{S :\, \mu(S) \le \gamma} \phi(S)$.
There is a polynomial time algorithm that, for any $\gamma, \epsilon > 0$, finds a set $S$ of volume $\mu(S) \le 2\gamma^{1+\epsilon}$
and expansion $\phi(S) \le \sqrt{2\phi(\gamma)/\epsilon}$. Our proof techniques also provide a simpler proof of the
structural result of Arora, Barak, Steurer [ABS10], that can be applied to irregular graphs.

Our main technical tool is that for any set $S$ of vertices of a graph, a lazy $t$-step random
walk started from a randomly chosen vertex of $S$ will remain entirely inside $S$ with probability
at least $(1 - \phi(S)/2)^t$. This itself provides a new lower bound on the uniform mixing time of
any finite-state reversible Markov chain.



1 Introduction 

Let $G = (V, E)$ be an undirected graph, with $n := |V|$ vertices, and let $d(v)$ denote the degree of
vertex $v \in V$. The measure (volume) of a set $S \subseteq V$ is defined as the sum of the degrees of the
vertices in $S$,

$$\mu(S) := \sum_{v \in S} d(v).$$

The conductance of a set $S$ is defined as

$$\phi(S) := \partial(S)/\mu(S),$$

where $\partial(S)$ denotes the number of edges that leave $S$. Let

$$\phi(G) := \min_{S :\, \mu(S) \le \mu(V)/2} \phi(S)$$

be the conductance (uniform sparsest cut) of $G$. The Cheeger inequalities [AM85, Alo86] prove
that the spectral partitioning algorithm finds, in nearly linear time, an $O(1/\sqrt{\phi(G)})$ approximation
to the uniform sparsest cut problem. Most notably, the approximation factor does not depend on
the size of the graph; in particular, the Cheeger inequalities imply a constant factor approximation
if $\phi(G)$ is constant. Variants of the spectral partitioning algorithm are widely used in practice
[Kle99, SM00, TM06].

Often, one is interested in applying a sparsest cut approximation algorithm iteratively, that 
is, first find an approximate sparsest cut in the graph, and then recurse on one or both of the 
subgraphs induced by the set found by the algorithm and by its complement. Such iteration might 
be used to find a balanced sparse cut if one exists (cf. [OSV12]), or to find a good clustering of
the graph, an approach that led to approximate clusterings with good worst-case guarantees, as
shown by Kannan, Vempala and Vetta [KVV04]. Even though each application of the spectral 
partitioning algorithm runs in nearly linear time, iterated applications of the algorithm can result 
in a quadratic running time. 

Spielman and Teng [ST04], and subsequently [ACL06, AP09] studied local graph partitioning 
algorithms that find a set S of approximately minimal conductance in time nearly linear in the size 
of the output set $S$. Note that the running time can be sublinear in the size of the input graph
if the algorithm finds a small output set S. When iterated, such an algorithm finds a balanced 
sparse cut in nearly linear time in the size of the graph, and can be used to find a good clustering 
in nearly linear time as well. 

Another advantage of such "local" algorithms is that if there are both large and small sets 
of near-optimal conductance, the algorithm is more likely to find the smaller sets. Thus, such 
algorithms can be used to approximate the "small-set expander" problem, which is related to the 
unique games conjecture [RS10] and the expansion profile of a graph (that is, what is the cut of
smallest conductance among all sets of a given volume). Finding small, low-conductance, sets is also 
interesting in clustering applications. In a social network, for example, a low-conductance set of 
users in the "friendship" graph represents a "community" of users who are significantly more likely 
to be friends with other members of the community than with non-members, and discovering such 
communities has several applications. While large communities might correspond to large-scale, 
known, factors, such as the fact that American users are more likely to have other Americans as 
friends, or that people are more likely to have friends around their age, small communities contain 
more interesting information. 

A local graph clustering algorithm is a local graph algorithm that finds a non-expanding set in
the local neighborhood of a given vertex $v$, in time proportional to the size of the output set. The
work/volume ratio of such an algorithm, which is the ratio of the computational time of the
algorithm in a single run to the volume of the output set, may depend only polylogarithmically
on the size of the graph.

The problem was first studied in the remarkable work of Spielman and Teng [ST04]. Spielman and
Teng design an algorithm Nibble such that for any set $U \subseteq V$, if the initial vertex $v$ is sampled
randomly according to the degree of vertices in $U$, then with a constant probability Nibble finds a set of
conductance $O(\phi^{1/2}(U) \log^{3/2} n)$, with a work/volume ratio of $O(\phi^{-2}(U)\,\mathrm{polylog}(n))$. Nibble finds
the desired set by looking at the threshold sets of the probability distribution of a $t$-step random walk
started at $v$. To achieve the desired computational time they keep the support of the probability
distribution small by removing a small portion of the probability mass at each step.

Andersen, Chung and Lang [ACL06] used the approximate PageRank vector rather than the
approximate random walk distribution, and they managed to improve the conductance of the output
set to $O(\sqrt{\phi(U) \log n})$, and the work/volume ratio to $O(\phi^{-1}(U)\,\mathrm{polylog}\, n)$. More recently,
Andersen and Peres [AP09] used the evolving set process developed in the work of Diaconis and Fill
[DF90], and they improved the work/volume ratio to $O(\phi^{-1/2}(U)\,\mathrm{polylog}\, n)$, while achieving the
same guarantee as [ACL06] on the conductance of the output set.

It has been a long-standing open problem to design a local variant of the Cheeger inequalities:
that is to provide a sublinear time algorithm with an approximation guarantee that does not depend 
on the size of G, assuming that the size of the optimum set is sufficiently smaller than n, and a 
randomly chosen vertex of the optimum set is given. In this work we answer this question, and we 
prove the following theorem: 

Theorem 1.1. ParESP$(v, \gamma, \phi, \epsilon)$ takes as input a starting vertex $v \in V$, a target conductance
$\phi \in (0, 1)$, a target size $\gamma$, and $0 < \epsilon < 1$. For a given run of the algorithm it outputs a set $S$ of
vertices with the expected work per volume ratio of $O(\gamma^{\epsilon}\phi^{-1/2}\log^2 n)$. If $U \subseteq V$ is a set of vertices
that satisfies $\phi(U) \le \phi$ and $\mu(U) \le \gamma$, then there is a subset $U' \subseteq U$ with volume at least $\mu(U)/2$,
such that if $v \in U'$, then with a constant probability $S$ satisfies

1. $\phi(S) = O(\sqrt{\phi/\epsilon})$,

2. $\mu(S) \le O(\gamma^{1+\epsilon})$.

We remark that unlike the previous local graph clustering algorithms, the running time of the 
algorithm is slightly super linear in the size of the optimum. 

As a byproduct of the above result we give an approximation algorithm for the expansion profile
of $G$. Lovász and Kannan [LK99] defined the expansion profile of a graph $G$ as follows:

$$\phi(\gamma) := \min_{S :\, \mu(S) \le \gamma} \phi(S).$$

Lovász and Kannan used the expansion profile as a parameter to prove strong upper bounds on
the mixing time of random walks. The notion of expansion profile recently received significant
attention in the literature because of its close connection to the small set expansion problem
and the unique games conjecture [RS10]. Raghavendra, Steurer, Tetali [RST10], and Bansal et
al. [BFK+11] use semidefinite programming and designed algorithms that approximate $\phi(\gamma)$ within
$O(\sqrt{\phi(\gamma)^{-1}\log(\mu(V)/\gamma)})$ and $O(\sqrt{\log n \log(\mu(V)/\gamma)})$ of the optimum, respectively. However, in the
interesting regime of $\gamma = o(\mu(V))$, which is of interest to the small set expansion problem, the quality
of both approximation algorithms is not independent of $\gamma$.

Here, we prove a $\gamma$-independent approximation of $\phi(\gamma)$ as a function of $\phi(\gamma^{1-\epsilon})$, without any
dependency on the size of the graph; specifically we prove the following theorem:

Theorem 1.2. There is a polynomial time algorithm that takes as input a target conductance $\phi$
and $0 < \epsilon < 1/4$, and outputs a set $S$, s.t. if $\phi(U) \le \phi$ for some $U \subseteq V$, then $\mu(S) \le 2\mu(U)^{1+\epsilon}$ and
$\phi(S) \le \sqrt{2\phi/\epsilon}$.

Our theorem indicates that the hard instances of the small set expansion problem are those
where $\phi(\gamma) \approx 1$ for $\gamma \le n^{1-\Omega(1)}$.
We remark that one can also use ParESP to approximate $\phi(\gamma)$ with slightly worse guarantees
(up to constant factors), in sublinear time. Here, for the sake of clarity and simplicity of the
arguments, we prove the theorem using random walks. Independent of our work, Kwok and Lau
[KL12] have obtained a somewhat different proof of Theorem 1.2.

Our analysis techniques also provide a simpler proof of the structural result of Arora, Barak,
Steurer [ABS10], that can be applied to non-regular graphs. Let $A$ be the adjacency matrix of
$G$, and $D$ be the diagonal matrix of vertex degrees. Let the threshold rank of $G$, denoted by
$\mathrm{rank}_{1-\eta}(D^{-1}A)$, be the number (with multiplicities) of eigenvalues $\lambda$ of $D^{-1}A$ satisfying $\lambda > 1 - \eta$.

Theorem 1.3. For any graph $G$, and $0 < \epsilon < 1$, if $\mathrm{rank}_{1-\eta}(D^{-1}A) \ge n^{(1+\epsilon)\eta/\phi}$, then there exists
a set $S \subseteq V$ of volume $\mu(S) \le 4\mu(V)n^{-\eta/\phi}$ and $\phi(S) \le \sqrt{2\phi/\epsilon}$. Such a set can be found by finding
the smallest threshold set of conductance $\sqrt{2\phi/\epsilon}$ among the rows of $(D^{-1}A)^t$, for $t = O(\log n/\phi)$.

We remark that Arora et al. [ABS10] prove a variant of the above theorem for regular graphs
with the stronger assumption that $\mathrm{rank}_{1-\eta}(D^{-1}A) \ge n^{100\eta/\phi}$. This essentially resolves their
question of whether the factor 100 can be improved to $1 + \epsilon$. Independent of our work, O'Donnell and
Witmer [OW12] obtained a different proof of the above theorem.

1.1 Techniques 

Our main technical result is that if $S$ is a set of vertices, and we consider a $t$-step lazy random walk
started at a random element of $S$, then the probability that the walk is entirely contained in $S$ is
at least $(1 - \phi(S)/2)^t$. Previously, only the lower bound $1 - t\phi(S)/2$ was known, and the analysis
of other local clustering algorithms implicitly or explicitly depended on such a bound.

For comparison, when $t = 1/\phi(S)$, the known bound would imply that the walk has probability
at least $1/2$ of being entirely contained in $S$, with no guarantee being available in the case
$t = 2/\phi(S)$, while our bound implies that for $t = (\alpha \ln n)/\phi$ the probability of being entirely contained
in $S$ is still at least $1/n^{\alpha}$. Roughly speaking, the $\Omega(\log n)$ factor that we gain in the length of
walks that we can study corresponds to our improvement in the expansion bound, while the $1/n^{\alpha}$
factor that we lose in the probability corresponds to the factor that we lose in the size of the
non-expanding set. We also use this bound to prove stronger lower bounds on the uniform mixing time
of reversible Markov chains.
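The gap between the two bounds can be made concrete with a few lines of arithmetic. The following is a minimal sketch; the function names and the sample values of $\phi$, $n$ and $\alpha$ are illustrative choices, not taken from the paper.

```python
import math


def linear_bound(phi, t):
    """Previously known lower bound 1 - t*phi/2 (vacuous once it reaches 0)."""
    return 1 - t * phi / 2


def product_bound(phi, t):
    """Lower bound (1 - phi/2)^t from the main technical result."""
    return (1 - phi / 2) ** t


# Illustrative parameters (assumptions for this example only).
phi, n, alpha = 0.01, 10**6, 0.5
for label, t in [("1/phi", 1 / phi),
                 ("2/phi", 2 / phi),
                 ("alpha*ln(n)/phi", alpha * math.log(n) / phi)]:
    print(f"t = {label:>16}: linear {linear_bound(phi, t):+.4f}, "
          f"product {product_bound(phi, t):.4f}")
```

At $t = 2/\phi$ the linear bound is already vacuous, while the product bound stays positive for walks that are a $\log n$ factor longer.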

Our polynomial time algorithm to approximate the expansion profile of a graph is the same
as the algorithm used by Arora, Barak and Steurer [ABS10] to find small non-expanding sets in
graphs of a given threshold rank, but our analysis is different. (Our analysis can also be used
to give a different proof of their result.) Arora, Barak and Steurer use the threshold rank to argue
that a random walk started from a random vertex of $G$ will be at the initial vertex after $t$ steps
with probability at least $\mathrm{rank}_{1-\eta}(G)(1 - \eta)^t/n$. Then, they argue that if all sets of a certain size
have large conductance, this probability must be small, which is a contradiction. To make this
quantitative they use the second norm of the probability distribution vector ($\|p_t\|$) as a potential
function, where $p_t$ is the distribution of the walk after $t$ steps, and they choose $t$ to get a sufficiently
small potential function and argue that the probability of being at the initial vertex after $t$ steps
must be small. In our analysis, we use the fact that for any set $S$, there is a vertex $v \in S$ such
that the probability that the walk started at $v$ remains in $S$ is at least $(1 - \phi(S)/2)^t$. Then, we use
the potential function $I(p_t, \gamma)$ introduced in the work of Lovász and Simonovits [LS90]. Roughly
speaking, $I(p_t, \gamma)$ is defined as follows: consider the distribution $p_t$ of the vertex reached in a $t$-step
random walk started from a random element of $S$, take the $k$ vertices of highest probability under
$p_t$, where $k$ is chosen so that their total volume is about $\gamma$; then $I(p_t, \gamma)$ is the total probability
under $p_t$ of those $k$ vertices.

Using the machinery of Lovász and Simonovits, we can upper-bound $I(p_t, \gamma)$ by $\gamma/\Gamma + \sqrt{\gamma}\,(1 - \Phi^2/2)^t$,
conditioned on all of the threshold sets of volume at most $\Gamma$ of the probability distribution vectors
up to time $t$ having conductance at least $\Phi$. Letting $t = \Omega(\alpha \log \mu(S)/\phi(S))$, $\Gamma = O(\mu(S)^{1+2\alpha})$,
and $\Phi = O(\sqrt{\phi(S)/\alpha})$, since the walk remains in $S$ with probability $\mu(S)^{-\alpha} \le I(p_t, \gamma)$, at least one
of the threshold sets of volume at most $O(\mu(S)^{1+2\alpha})$ must have conductance $O(\sqrt{\phi(S)/\alpha})$.

Our local algorithm uses the evolving set process. The evolving set process starts with a vertex
$v$ of the graph, and then produces a sequence of sets $S_1, S_2, \ldots, S_\tau$, with the property that at least
one set $S_t$ is such that $\partial(S_t)/\mu(S_t) \le O(\sqrt{\log \mu(S_\tau)/\tau})$. If one can show that up to some time $T$
the process constructs sets all of volume at most $\gamma$, then we get a set of volume at most $\gamma$ and
conductance at most $O(\sqrt{\log \gamma/T})$. Andersen and Peres were able to show that if the graph has a
set $S$ of conductance $\phi$, then the process is likely to construct sets all of volume at most $2\mu(S)$ for
at least $T = \Omega(1/\phi)$ steps, if started from a random element of $S$, leading to their $O(\sqrt{\phi \log n})$
guarantee. We show that for any chosen $\alpha < 1/2$, the process will construct sets of volume at
most $O(\mu(S)^{1+2\alpha})$ for $T = \Omega(\alpha \log \mu(S)/\phi)$ steps, with probability at least $1/\mu(S)^{\alpha}$. This is enough
to guarantee that, at least with probability $1/\mu(S)^{\alpha}$, the process constructs at least one set of
conductance $O(\sqrt{\phi/\alpha})$. To obtain this conclusion, we also need to strengthen the first part of the
analysis of Andersen and Peres: they show that the process has at least a constant probability of
constructing a set of low conductance in the first $t$ steps, while we need to show that this happens
with probability at least $1 - 1/\mu(S)^{\Omega(1)}$, because we need to take a union bound with the event
that $t$ is large, for which probability we only have a $\mu(S)^{-\Omega(\alpha)}$ lower bound. Finally, to achieve a
constant probability of success, we run multiple copies of the evolving set process simultaneously, and
stop as soon as one of the copies finds a small non-expanding set.

2 Preliminaries 

2.1 Notations 

Let $G = (V, E)$ be an undirected graph, with $n := |V|$ vertices and $m := |E|$ edges. Let $A$ be
the adjacency matrix of $G$, $D$ be the diagonal matrix of vertex degrees, and $d(v)$ be the degree of
vertex $v \in V$. The volume of a subset $S \subseteq V$ is defined as the sum of the degrees of the vertices
in $S$,

$$\mu(S) := \sum_{v \in S} d(v).$$

Let $E(S, V \setminus S) := \{\{u, v\} : u \in S, v \notin S\}$ be the set of the edges connecting $S$ to $V \setminus S$, and we
use $\partial(S)$ to denote the number of those edges. We also let $E(S) := \{\{u, v\} : u, v \in S\}$ be the set of
edges inside $S$. The conductance of a set $S \subseteq V$ is defined to be

$$\phi(S) := \partial(S)/\mu(S).$$

Observe that $\phi(V) = 0$. In the literature, the conductance of a set is sometimes defined to be
$\partial(S)/\min(\mu(S), \mu(V \setminus S))$. Notice that the two quantities are within a constant factor of each other if
$\mu(S) = O(\mu(V \setminus S))$. Since here we are interested in finding small non-expanding sets, we would
rather work with the above definition.
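The definitions of volume, boundary and conductance translate directly into code. The sketch below assumes a simple undirected graph stored as an adjacency-list dictionary `adj`; this representation, the function names and the example graph are illustrative assumptions and not part of the paper.

```python
def volume(adj, S):
    """mu(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[v]) for v in S)


def boundary(adj, S):
    """Number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u in S for w in adj[u] if w not in S)


def conductance(adj, S):
    """phi(S) = boundary(S) / mu(S), the definition used in this section."""
    return boundary(adj, S) / volume(adj, S)


def symmetric_conductance(adj, S):
    """Alternative definition: boundary(S) / min(mu(S), mu(V minus S))."""
    rest = set(adj) - set(S)
    return boundary(adj, S) / min(volume(adj, S), volume(adj, rest))


# Example graph: two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
print(volume(adj, S), boundary(adj, S), conductance(adj, S))
```

The two definitions agree whenever $\mu(S) \le \mu(V \setminus S)$ and differ only for sets that contain more than half of the total volume.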
We define the following probability distribution vector on a set $S \subseteq V$ of vertices:

$$\pi_S(v) := \begin{cases} d(v)/\mu(S) & \text{if } v \in S, \\ 0 & \text{otherwise.} \end{cases}$$

In particular, we use $\pi(v) = \pi_V(v)$ as the stationary distribution of a random walk in $G$.

Throughout the paper, let $I$ be the identity matrix, and for any subset $S \subseteq V$, let $I_S$ be the
diagonal matrix such that $I_S(v, v) = 1$ if $v \in S$ and $0$ otherwise. Also, let $\mathbf{1}$ be the all-ones vector, and
$\mathbf{1}_S$ be the indicator vector of the set $S$. We may abuse the notation and use $\mathbf{1}_v$ instead of $\mathbf{1}_{\{v\}}$ for
a vertex $v \in V$.

We use lowercase bold letters to denote vectors, and capital letters for matrices and sets. For
a vector $x : V \to \mathbb{R}$ and a set $S \subseteq V$, we use $x(S) := \sum_{v \in S} x(v)$. Unless otherwise specified, $x$ is
considered to be a column vector, and $x'$ is its transpose.

For a square matrix $A$, we use $\lambda_{\min}(A)$ to denote the minimum eigenvalue of $A$, and $\lambda_{\max}(A)$
to denote the maximum eigenvalue of $A$.

2.2 Random Walks 

We will consider the lazy random walk on $G$ that at each time step stays at the current vertex with
probability $1/2$ and otherwise moves to the endpoint of a random edge attached to the current
vertex. We abuse notation, and we use $\mathcal{G} := (D^{-1}A + I)/2$ as the transition probability matrix of
this random walk, and $\pi(\cdot)$ is the unique stationary distribution, that is $\pi'\mathcal{G} = \pi'$. We write $\mathbb{P}_v[\cdot]$
to denote the probability measure of the lazy random walk started from a vertex $v \in V$.

Let $X_t$ be the random variable indicating the $t$-th step of the random walk started at $v$. Observe
that the distribution of $X_t$ is exactly $\mathbf{1}_v'\mathcal{G}^t$. For a subset $S \subseteq V$, a vertex $v \in V$, and an integer $t > 0$,
we write $\mathrm{esc}(v, t, S) := \mathbb{P}_v\left[\bigcup_{i=0}^{t} \{X_i \notin S\}\right]$ to denote the probability that the random walk started at
$v$ leaves $S$ in the first $t$ steps, and $\mathrm{rem}(v, t, S) := 1 - \mathrm{esc}(v, t, S)$ as the probability that the walk
stays entirely inside $S$. It follows that

$$\mathrm{rem}(v, t, S) = \mathbf{1}_v'(I_S \mathcal{G} I_S)^t \mathbf{1}_S. \qquad (1)$$
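Equation (1) says that the staying probability is obtained by repeatedly applying the lazy walk and discarding any mass that steps outside $S$. A minimal sketch, assuming a simple graph given as an adjacency-list dictionary `adj` (an illustrative representation) and using exact rational arithmetic:

```python
from fractions import Fraction


def lazy_walk_row(adj, u):
    """Row u of the lazy transition matrix (D^{-1}A + I)/2, as a dict."""
    row = {u: Fraction(1, 2)}
    for w in adj[u]:
        row[w] = row.get(w, 0) + Fraction(1, 2 * len(adj[u]))
    return row


def rem(adj, v, t, S):
    """1_v' (I_S G I_S)^t 1_S: probability that the walk from v stays in S."""
    S = set(S)
    if v not in S:
        return Fraction(0)
    dist = {v: Fraction(1)}            # mass that has not left S so far
    for _ in range(t):
        nxt = {}
        for u, mass in dist.items():
            for w, pr in lazy_walk_row(adj, u).items():
                if w in S:             # I_S discards mass that steps outside S
                    nxt[w] = nxt.get(w, 0) + mass * pr
        dist = nxt
    return sum(dist.values(), Fraction(0))


def esc(adj, v, t, S):
    """Probability that the walk from v leaves S within the first t steps."""
    return 1 - rem(adj, v, t, S)


# Example graph: two triangles joined by the edge {2,3}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
print(rem(adj, 2, 1, S), rem(adj, 0, 2, S))
```

Only the vertices of $S$ ever carry mass, so the cost of each step depends on the volume of $S$ rather than on the size of the graph.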

2.3 Spectral Properties of the Transition Probability Matrix 

Although the transition probability matrix $\mathcal{G}$ is not a symmetric matrix, it features many
properties of symmetric matrices. First of all, $\mathcal{G}$ can be transformed to a symmetric matrix simply by
considering $D^{1/2}\mathcal{G}D^{-1/2}$. It follows that any eigenvector of $D^{1/2}\mathcal{G}D^{-1/2}$ can be transformed into
a left (right) eigenvector of $\mathcal{G}$, once it is multiplied by $D^{1/2}$ ($D^{-1/2}$), respectively. Hence, the
left and right eigenvalues of $\mathcal{G}$ are the same, and they are real.

Furthermore, since $\|D^{-1}A\|_\infty \le 1$, and $\mathcal{G}$ is the average of $D^{-1}A$ and the identity matrix,
we must have $\lambda_{\min}(\mathcal{G}) \ge 0$ and $\lambda_{\max}(\mathcal{G}) \le 1$. Thus $D^{1/2}\mathcal{G}D^{-1/2}$ is a positive semidefinite
symmetric matrix, whose largest eigenvalue is at most 1.

2.4 The Evolving Set Process 

The evolving set process is a Markov chain on the subsets of the vertex set $V$. The process, together
with the closely related volume biased evolving set process, was introduced in the work of Diaconis
and Fill [DF90] as the strong stationary dual of a random walk. Morris and Peres [MP03] use it to
upper-bound the mixing time of random walks in terms of isoperimetric properties.

Given a subset $S_0 \subseteq V$, the next subset $S_1$ is chosen as follows: first we choose a threshold
$R \in [0, 1]$ uniformly at random. Then, we let

$$S_1 := \{u : \mathbb{P}_u[X_1 \in S_0] \ge R\}.$$
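One transition of the evolving set process can be sketched as follows. The code assumes a simple graph stored as an adjacency-list dictionary and takes the threshold $R$ as an explicit argument so that the step is deterministic; the helper names are hypothetical and this is not the GenerateSample simulation of [AP09].

```python
import random
from fractions import Fraction


def prob_into(adj, u, S):
    """P_u[X_1 in S] for the lazy walk: 1/2 for staying if u is in S,
    plus 1/2 times the fraction of neighbours of u that lie in S."""
    stay = Fraction(1, 2) if u in S else Fraction(0)
    move = Fraction(sum(1 for w in adj[u] if w in S), 2 * len(adj[u]))
    return stay + move


def evolving_set_step(adj, S, R):
    """Next set S_1 = {u : P_u[X_1 in S] >= R} for a given threshold R."""
    S = set(S)
    return {u for u in adj if prob_into(adj, u, S) >= R}


def random_evolving_set_step(adj, S, rng=random):
    """Same step with R drawn uniformly from [0, 1)."""
    return evolving_set_step(adj, S, rng.random())


# Example graph: two triangles joined by the edge {2,3}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(evolving_set_step(adj, {0, 1, 2}, 0.9))
print(evolving_set_step(adj, {0, 1, 2}, 0.1))
```

A large threshold shrinks the set to the vertices that are deep inside it, while a small threshold grows it by the vertices that have a neighbour in it.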

The transition kernel of the evolving set process is defined as $\mathbf{K}(S, S') = \mathbb{P}[S_1 = S' \mid S_0 = S]$. It
follows that $\emptyset$ and $V$ are the absorbing states of this Markov chain, and the rest of the states are
transient. Morris and Peres [MP03] defined the growth gauge $\psi(S)$ of a set $S$ as follows:

$$\psi(S) := 1 - \mathbb{E}\left[\sqrt{\frac{\mu(S_1)}{\mu(S)}} \;\middle|\; S_0 = S\right].$$

They showed that

Proposition 2.1 (Morris, Peres [MP03]). For any set $S \subseteq V$, $\psi(S) \ge \phi(S)^2/8$.

The volume-biased evolving set process is a special case of the evolving set process where the
Markov chain is conditioned to be absorbed in $V$. In particular, the transition kernel of the
volume biased ESP is defined as follows:

$$\hat{\mathbf{K}}(S, S') := \frac{\mu(S')}{\mu(S)}\,\mathbf{K}(S, S').$$

Given a state $S_0$, we write $\hat{\mathbb{P}}_{S_0}[\cdot] := \hat{\mathbb{P}}[\cdot \mid S_0]$ to denote the probability measure of the volume biased
ESP started at state $S_0$, and we use $\hat{\mathbb{E}}_{S_0}[\cdot] := \hat{\mathbb{E}}[\cdot \mid S_0]$ for the expectation.

Andersen and Peres used the volume biased ESP as a local graph clustering algorithm [AP09]. 
They show that for any non-expanding set U, if we run the volume biased ESP from a randomly 
chosen vertex of $U$, then with a constant probability there is a set in the sample path of expansion
$O(\sqrt{\phi(U) \log n})$ and volume at most $2\mu(U)$. As a part of their proof, they designed an efficient
simulation of the volume biased ESP, called GenerateSample. They prove the following theorem.

Theorem 2.2 (Andersen, Peres [AP09, Theorems 3, 4]). There is an algorithm, GenerateSample,
that simulates the volume biased ESP such that for any vertex $v \in V$, any sample path
$(S_0 = \{v\}, \ldots, S_\tau)$ is generated with probability $\hat{\mathbb{P}}_v[S_0, \ldots, S_\tau]$. Furthermore, for a stopping time $\tau$ that
is bounded above by $T$, let $W(\tau)$ be the time complexity of GenerateSample if it is run up to time
$\tau$. Then, the expected work per volume ratio of the algorithm is

$$\hat{\mathbb{E}}_v\left[\frac{W(\tau)}{\mu(S_\tau)}\right] = O(T^{1/2} \log^{3/2} \mu(V)).$$



3 Upper Bounds on the Escaping Probability of Random Walks 

In this section we establish strong results on the escaping probability of random walks. Spielman
and Teng [ST08] show that for any set $S \subseteq V$ and $t > 0$, the random walk started at a randomly
(proportional to degree) chosen vertex of $S$ remains in $S$ for $t$ steps with probability at least
$1 - t\phi(S)/2$. We strengthen this result by improving the lower bound to $(1 - \phi(S)/2)^t$.

Proposition 3.1. For any set $S \subseteq V$, and integer $t > 0$,

$$\mathbb{E}_{v \sim \pi_S}[\mathrm{rem}(v, t, S)] \ge \left(1 - \frac{\phi(S)}{2}\right)\mathbb{E}_{v \sim \pi_S}[\mathrm{rem}(v, t-1, S)] \ge \cdots \ge \left(1 - \frac{\phi(S)}{2}\right)^t. \qquad (2)$$

Furthermore, there is a subset $S^t \subseteq S$, such that $\mu(S^t) \ge \mu(S)/2$, and for all $v \in S^t$,

$$\mathrm{rem}(v, t, S) \ge \frac{1}{200}\left(1 - \frac{3\phi(S)}{2}\right)^t. \qquad (3)$$

We remark that the second statement does not follow from a simple application of the Markov
inequality to the first statement, as is the case in [ST08]. Hence, here both of the results
incorporate non-trivial spectral arguments.

As a corollary, we prove strong lower bounds on the uniform mixing time of random walks in
Section 6. In the rest of this section we prove Proposition 3.1. We start by proving (2).
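Before the proof, inequality (2) can be checked numerically on a small example. The sketch below assumes a simple graph stored as an adjacency-list dictionary and uses exact rational arithmetic; the helper names and the example graph are illustrative, and the check is a sanity test on one instance rather than a proof.

```python
from fractions import Fraction


def stay_mass(adj, S, start, t):
    """The vector 1_start' (I_S G I_S)^t restricted to S, for the lazy walk G."""
    dist = {start: Fraction(1)}
    for _ in range(t):
        nxt = {}
        for u, mass in dist.items():
            nxt[u] = nxt.get(u, 0) + mass / 2                  # lazy self-loop
            for w in adj[u]:
                if w in S:                                      # mass leaving S is dropped
                    nxt[w] = nxt.get(w, 0) + mass / (2 * len(adj[u]))
        dist = nxt
    return dist


def expected_rem(adj, S, t):
    """E_{v ~ pi_S}[rem(v, t, S)] with pi_S(v) = d(v) / mu(S)."""
    mu_S = sum(len(adj[v]) for v in S)
    return sum(Fraction(len(adj[v]), mu_S) * sum(stay_mass(adj, S, v, t).values())
               for v in S)


def conductance(adj, S):
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    return Fraction(cut, sum(len(adj[v]) for v in S))


# Example graph: two triangles joined by the edge {2,3}; S is one triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
phi = conductance(adj, S)
for t in range(1, 6):
    print(t, expected_rem(adj, S, t), (1 - phi / 2) ** t)
```

For $t = 1$ the two sides coincide, which matches the one-step identity (7) used in the proof below.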

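Using equation (1), and a simple induction on $t$, (2) is equivalent to the following equation:

$$\pi_S'(I_S \mathcal{G} I_S)^t \mathbf{1}_S \ge (1 - \phi(S)/2)\, \pi_S'(I_S \mathcal{G} I_S)^{t-1} \mathbf{1}_S. \qquad (4)$$

Let $P := D^{1/2} I_S \mathcal{G} I_S D^{-1/2}$. First we show that (4) is equivalent to the following equation:

$$\sqrt{\pi_S}' P^t \sqrt{\pi_S} \ge \left(\sqrt{\pi_S}' P \sqrt{\pi_S}\right)\left(\sqrt{\pi_S}' P^{t-1} \sqrt{\pi_S}\right). \qquad (5)$$

Then, we use Lemma 3.2, which shows that the above equation holds for any symmetric positive
semidefinite matrix $P$ and any norm one vector $x = \sqrt{\pi_S}$. First observe that by the definition of $P$, for
any $t > 0$,

$$\pi_S'(I_S \mathcal{G} I_S)^t \mathbf{1}_S = \pi_S' D^{-1/2} P^t D^{1/2} \mathbf{1}_S = \sqrt{\pi_S}' P^t \sqrt{\pi_S}. \qquad (6)$$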



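On the other hand,

$$\pi_S'(I_S \mathcal{G} I_S)\mathbf{1}_S = \frac{1}{2}\pi_S'(D^{-1}A + I)\mathbf{1}_S = \frac{2|E(S)|}{2\mu(S)} + \frac{1}{2} = 1 - \phi(S)/2, \qquad (7)$$

where the last equality uses $\mu(S) = 2|E(S)| + \partial(S)$.

Equation (5) is derived simply from equation (4) by putting (6) and (7) together. Next we prove
equation (5) using Lemma 3.2. First observe that $\sqrt{\pi_S}$ is a norm one vector. On the other hand,
by definition $P = \frac{1}{2}\left(D^{-1/2} I_S A I_S D^{-1/2} + D^{1/2} I_S D^{-1/2}\right)$ is a symmetric matrix.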

It remains to show that $P$ is positive semidefinite. This follows from the same reason that
$D^{1/2}\mathcal{G}D^{-1/2}$ is positive semidefinite. In particular, since eigenvectors of $P$ can be transformed to
eigenvectors of $I_S \mathcal{G} I_S$, the eigenvalues of $I_S \mathcal{G} I_S$ are the same as the eigenvalues of $P$. Finally, since
$\|I_S D^{-1} A I_S\|_\infty \le 1$, and $I_S \mathcal{G} I_S$ is the average of $I_S D^{-1} A I_S$ and $I_S$, we must have $\lambda_{\min}(I_S \mathcal{G} I_S) \ge 0$
and $\lambda_{\max}(I_S \mathcal{G} I_S) \le 1$. Thus $P$ is positive semidefinite. Now, (5) simply follows from Lemma 3.2.
This completes the proof of (2).

It remains to prove (3). We prove it by showing that for any set $X \subseteq S$ of volume $\mu(X) \ge \mu(S)/2$,
the random walk started at a randomly (proportional to degree) chosen vertex of $X$
remains in $S$ with probability at least $\frac{1}{200}(1 - 3\phi(S)/2)^t$,

$$\mathbb{E}_{v \sim \pi_X}[\mathrm{rem}(v, t, S)] \ge \pi_X'(I_S \mathcal{G} I_S)^t \mathbf{1}_X \ge \frac{1}{200}\left(1 - \frac{3\phi(S)}{2}\right)^t. \qquad (8)$$

Therefore, in any such set $X$ there is a vertex that satisfies (3); hence the volume of the set of
vertices that satisfy (3) is at least half of $\mu(S)$.

Using equations (6) and (7), the second inequality in (8) is equivalent to the following equation,

$$\sqrt{\pi_X}' P^t \sqrt{\pi_X} \ge \frac{1}{200}\left(3\sqrt{\pi_S}' P \sqrt{\pi_S} - 2\right)^t. \qquad (9)$$

We prove the above equation using Lemma 3.3. Let $Y = S \setminus X$, and define

$$x := I_X\sqrt{\pi_S} = \sqrt{\mu(X)\pi_X/\mu(S)}, \qquad y := I_Y\sqrt{\pi_S} = \sqrt{\mu(Y)\pi_Y/\mu(S)}. \qquad (10)$$

Since $X \cap Y = \emptyset$, $\langle x, y \rangle = 0$, and $\|x + y\| = \|\sqrt{\pi_S}\| = 1$. Furthermore, since $\mu(X) \ge \mu(S)/2 \ge \mu(Y)$,
$\|x\| \ge \|y\|$. Therefore, $P$, $x$, $y$ satisfy the requirements of Lemma 3.3. Finally, since
$\sqrt{\pi_X}' P^t \sqrt{\pi_X} \ge x' P^t x$, (8) follows from Lemma 3.3. This completes the proof of Proposition
3.1.

Lemma 3.2. Let $P \in \mathbb{R}^{n \times n}$ be a symmetric positive semidefinite matrix. Then, for any $x \in \mathbb{R}^n$ of
norm $\|x\| = 1$, and integer $t > 0$,

$$x'P^t x \ge (x'P^{t-1}x)(x'Px) \ge \cdots \ge (x'Px)^t.$$

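Proof. Since all of the inequalities in the lemma's statement follow from the first inequality, we only
prove the first inequality. Let $v_1, v_2, \ldots, v_n$ be the set of orthonormal eigenvectors of $P$ with the
corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. For any $k \ge 1$, we have

$$x'P^k x = \left(\sum_{i=1}^n \langle x, v_i\rangle v_i\right)' P^k \left(\sum_{i=1}^n \langle x, v_i\rangle v_i\right) = \left(\sum_{i=1}^n \langle x, v_i\rangle \lambda_i^k v_i\right)' \left(\sum_{i=1}^n \langle x, v_i\rangle v_i\right) = \sum_{i=1}^n \langle x, v_i\rangle^2 \lambda_i^k. \qquad (11)$$

On the other hand, since $\{v_1, \ldots, v_n\}$ is an orthonormal system, we have

$$\sum_{i=1}^n \langle x, v_i\rangle^2 = \|x\|^2 = 1.$$

For any $k \ge 0$, let $f_k(\lambda) = \lambda^k$; it follows that

$$\sum_{i=1}^n \langle x, v_i\rangle^2 \lambda_i^k = \mathbb{E}_{\Lambda \sim \mathcal{D}}[f_k(\Lambda)],$$

where $\mathbb{P}_{\Lambda \sim \mathcal{D}}[\Lambda = \lambda_i] = \langle x, v_i\rangle^2$. Using equation (11) we may rewrite the lemma's statement as
follows:

$$\mathbb{E}_{\mathcal{D}}[f_{t-1}(\Lambda) f_1(\Lambda)] \ge \mathbb{E}_{\mathcal{D}}[f_{t-1}(\Lambda)]\, \mathbb{E}_{\mathcal{D}}[f_1(\Lambda)].$$

Since $P$ is positive semidefinite, $\lambda_{\min}(P) \ge 0$. Thus, for all $t \ge 0$, the function $f_t$ is increasing on
the support of $\mathcal{D}$. The above inequality follows from Chebyshev's sum inequality. □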
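The chain of inequalities in Lemma 3.2 can be checked on a small instance with exact arithmetic. The matrix and the unit vector below are arbitrary illustrative choices; the check is a sanity test on one example, not a substitute for the proof.

```python
from fractions import Fraction as F


def mat_vec(P, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


def quad_form_power(P, x, t):
    """x' P^t x, computed by applying P to x exactly t times."""
    y = list(x)
    for _ in range(t):
        y = mat_vec(P, y)
    return sum(a * b for a, b in zip(x, y))


# Symmetric matrix with eigenvalues 1/3 and 1, hence positive semidefinite.
P = [[F(2, 3), F(1, 3)],
     [F(1, 3), F(2, 3)]]
x = [F(3, 5), F(4, 5)]          # unit vector: (3/5)^2 + (4/5)^2 = 1

values = [quad_form_power(P, x, t) for t in range(7)]
for t in range(1, 7):
    print(t, values[t], values[t - 1] * values[1], values[1] ** t)
```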

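Lemma 3.3. Let $P \in \mathbb{R}^{n \times n}$ be a symmetric positive semidefinite matrix such that $\lambda_{\max}(P) \le 1$,
and let $x, y \in \mathbb{R}^n$ be such that $\langle x, y\rangle = 0$, $\|x + y\| = 1$, and $\|x\| \ge \|y\|$. Then, for any integer $t > 0$,

$$x'P^t x \ge \frac{1}{200}\left(3(x + y)'P(x + y) - 2\right)^t.$$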

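Proof. Let $z := x + y$. Since $x$ is orthogonal to $y$, we have $\|y\|^2 \le 1/2 \le \|x\|^2$. Let $v_1, v_2, \ldots, v_n$
be the set of orthonormal eigenvectors of $P$ with the corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let
$\alpha > 0$ be a constant that will be fixed later in the proof. Define $B := \{i : |\langle x, v_i\rangle| \ge \alpha|\langle y, v_i\rangle|\}$.
First observe that

$$x'P^t x = \sum_{i=1}^n \langle x, v_i\rangle^2 \lambda_i^t \ge \sum_{i \in B} \langle x, v_i\rangle^2 \lambda_i^t \ge \frac{1}{(1 + 1/\alpha)^2}\sum_{i \in B} \langle z, v_i\rangle^2 \lambda_i^t, \qquad (12)$$

where the equality follows from equation (11), the first inequality uses $\lambda_{\min}(P) \ge 0$, and the last
inequality follows from the definition of $B$, that is, for any $i \in B$, $\langle x, v_i\rangle^2 \ge (\langle z, v_i\rangle/(1 + 1/\alpha))^2$.
Let $L := \sum_{i \in B} \langle z, v_i\rangle^2$. Then, since $\lambda_{\min}(P) \ge 0$, by Jensen's inequality,

$$\frac{1}{L}\sum_{i \in B} \langle z, v_i\rangle^2 \lambda_i^t \ge \left(\frac{1}{L}\sum_{i \in B} \langle z, v_i\rangle^2 \lambda_i\right)^t \ge \left(\frac{\sum_{i=1}^n \langle z, v_i\rangle^2 \lambda_i - (1 - L)}{L}\right)^t \ge \left(1 - \frac{1 - z'Pz}{(1 - \alpha^2 - 2\alpha)/2}\right)^t, \qquad (13)$$

where the second inequality follows by the assumptions that $\lambda_{\max}(P) \le 1$ and that $\|z\| = 1$, and
the last inequality follows from the fact that $z'Pz \le 1$, and that

$$L = \sum_{i \in B} \langle z, v_i\rangle^2 = 1 - \sum_{i \notin B} \langle z, v_i\rangle^2 \ge 1 - (1 + \alpha)^2\|y\|^2 \ge \frac{1 - \alpha^2 - 2\alpha}{2}.$$

Putting equations (12) and (13) together, and letting $\alpha = 0.154$, we get

$$x'P^t x \ge \frac{1 - \alpha^2 - 2\alpha}{2(1 + 1/\alpha)^2}\left(1 - \frac{1 - z'Pz}{(1 - \alpha^2 - 2\alpha)/2}\right)^t \ge \frac{1}{200}\left(3z'Pz - 2\right)^t.$$

□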
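The role of the constant $\alpha = 0.154$ in the last step can be verified with plain floating-point arithmetic. In the sketch below the variable names are illustrative; it only checks the two numerical facts the final inequality relies on, namely that the leading factor is at least $1/200$ and that the lower bound on $L$ is at least $1/3$.

```python
alpha = 0.154

# Lower bound on L from the proof: L >= (1 - alpha^2 - 2*alpha) / 2.
L_min = (1 - alpha**2 - 2 * alpha) / 2

# Leading factor obtained by combining (12) and (13).
prefactor = L_min / (1 + 1 / alpha) ** 2


def base(q, L=L_min):
    """1 - (1 - q)/L, the base of the t-th power in (13), where q = z'Pz."""
    return 1 - (1 - q) / L


print(f"L_min     = {L_min:.6f}")
print(f"prefactor = {prefactor:.6f}")
print(f"base(0.9) = {base(0.9):.6f}  versus  3*0.9 - 2 = {3 * 0.9 - 2:.6f}")
```

Since the lower bound on $L$ is at least $1/3$, the base $1 - (1 - q)/L$ is at least $3q - 2$ for every $q \le 1$, which gives the form stated in the lemma.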
4 Approximating the Expansion Profile



In this section we use the machinery developed in the works of Lovász and Simonovits [LS90, LS93]
to prove Theorem 1.2 and Theorem 1.3. We start by introducing some notation.

Let $p$ be a probability distribution vector on the vertices of $V$, and let $\sigma(\cdot)$ be the permutation
of the vertices that is decreasing with respect to $p(v)/d(v)$, breaking ties lexicographically.
That is, suppose

$$\frac{p(\sigma(1))}{d(\sigma(1))} \ge \cdots \ge \frac{p(\sigma(i))}{d(\sigma(i))} \ge \cdots \ge \frac{p(\sigma(n))}{d(\sigma(n))}.$$

We use $T_i(p) := \{\sigma(1), \ldots, \sigma(i)\}$ to denote the threshold set of the first $i$ vertices. Following
Spielman, Teng [ST08] (cf. Lovász, Simonovits [LS90]), we use the following potential function,

$$I(p, x) := \max_{w \in [0,1]^V,\ \sum_{v \in V} w(v) d(v) = x}\ \sum_{v \in V} w(v) p(v). \qquad (14)$$

Observe that $I(p, x)$ is a non-decreasing piecewise linear concave function of $x$; that is, $I(p, x) =
p(T_j(p))$ for $x = \mu(T_j(p))$, and it is linear in between. We use $I(p, x)$ as a potential function
to measure the distance of the distribution $p$ from the stationary distribution $\pi$.
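Because the maximization in (14) is a fractional knapsack, the curve can be evaluated greedily along the ordering by $p(v)/d(v)$. The following is a minimal sketch under that observation; the function name, the dictionary representation of $p$ and the degrees, and the example values are illustrative assumptions.

```python
from fractions import Fraction


def ls_curve(p, deg, x):
    """I(p, x): fill a budget x of volume greedily, taking vertices in
    decreasing order of p(v)/d(v); the last vertex may be taken fractionally."""
    order = sorted(p, key=lambda v: (-p[v] / deg[v], v))
    total, remaining = 0, x
    for v in order:
        if remaining <= 0:
            break
        take = min(deg[v], remaining)        # this equals w(v) * d(v)
        total += p[v] * take / deg[v]
        remaining -= take
    return total


# Example distribution on three vertices with degrees 1, 2 and 3.
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
deg = {"a": 1, "b": 2, "c": 3}
for x in range(7):
    print(x, ls_curve(p, deg, x))
```

At the breakpoints $x = \mu(T_j(p))$ the greedy value equals $p(T_j(p))$, and between breakpoints it interpolates linearly, as observed above.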

We find the small non-expanding set in Theorem 1.2 by running Threshold($\sqrt{2\phi/\epsilon}$, $\epsilon \ln \mu(V)/\phi$).
The algorithm simply returns the smallest non-expanding set among the threshold sets of the rows
of $\mathcal{G}^t$, for $t = O(\epsilon \log \mu(V)/\phi)$. The details are described in Algorithm 1.

Algorithm 1 Threshold($\phi, \epsilon$)

Let $\mathcal{T}$ be the family of all threshold sets $T_i(\mathbf{1}_v'\mathcal{G}^t)$, for any vertex $v \in V$ and $1 \le t \le \epsilon \log \mu(V)/\phi$,
with conductance at most $\sqrt{2\phi/\epsilon}$.

return the set with minimum volume in $\mathcal{T}$.
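Algorithm 1 can be sketched directly from its description. The code below is a straightforward, unoptimized rendering that assumes a simple graph stored as an adjacency-list dictionary with comparable vertex labels; the function names, the rounding of the number of steps, the use of the natural logarithm, and the example graph are assumptions made for illustration.

```python
import math


def lazy_step(adj, p):
    """One step of the lazy walk applied to a row vector p (stored as a dict)."""
    q = {v: 0.5 * mass for v, mass in p.items()}
    for u, mass in p.items():
        share = 0.5 * mass / len(adj[u])
        for w in adj[u]:
            q[w] = q.get(w, 0.0) + share
    return q


def threshold(adj, phi, eps):
    """Smallest-volume threshold set of conductance at most sqrt(2*phi/eps)."""
    vol_V = sum(len(nb) for nb in adj.values())
    T = max(1, math.floor(eps * math.log(vol_V) / phi))
    Phi = math.sqrt(2 * phi / eps)
    best, best_vol = None, None
    for v in adj:
        p = {v: 1.0}
        for _ in range(T):
            p = lazy_step(adj, p)
            # Sweep the vertices in decreasing order of p(u)/d(u), ties by label.
            order = sorted(adj, key=lambda u: (-p.get(u, 0.0) / len(adj[u]), u))
            inside, vol, cut = set(), 0, 0
            for u in order:
                to_inside = sum(1 for w in adj[u] if w in inside)
                cut += len(adj[u]) - 2 * to_inside   # update the boundary size
                vol += len(adj[u])
                inside.add(u)
                if cut / vol <= Phi and (best_vol is None or vol < best_vol):
                    best, best_vol = set(inside), vol
    return best


def two_cliques(k):
    """Two k-cliques on {0..k-1} and {k..2k-1}, joined by the edge {k-1, k}."""
    adj = {v: [] for v in range(2 * k)}
    for block in (range(k), range(k, 2 * k)):
        for u in block:
            adj[u].extend(w for w in block if w != u)
    adj[k - 1].append(k)
    adj[k].append(k - 1)
    return adj


print(threshold(two_cliques(5), phi=1 / 21, eps=0.2))
```

In this example one clique has volume 21 and conductance $1/21$, so the returned set is allowed a volume of up to $2 \cdot 21^{1.2}$ and a conductance of up to $\sqrt{2\phi/\epsilon} \approx 0.69$. Since the whole vertex set always qualifies as a threshold set of conductance 0, the family $\mathcal{T}$ is never empty.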

If none of the sets $T_i(\mathbf{1}_v'\mathcal{G}^t)$ is a non-expanding set, then Lovász and Simonovits [LS90, LS93]
prove that the curve $I(\mathbf{1}_v'\mathcal{G}^t, x)$ lies far below $I(\mathbf{1}_v'\mathcal{G}^{t-1}, x)$. This is quantified in the following
lemma:

Lemma 4.1 (Lovász, Simonovits [LS93, Lemma 1.3]). Let $\mathcal{G}$ be a transition probability matrix of
a lazy random walk on a graph. For any probability distribution vector $p$ on $V$, if $\phi(T_i(p'\mathcal{G})) \ge \Phi$,
then for $x = \mu(T_i(p'\mathcal{G}))$,

$$I(p'\mathcal{G}, x) \le \frac{1}{2}\Big(I\big(p, x - 2\Phi \min(x, 2m - x)\big) + I\big(p, x + 2\Phi \min(x, 2m - x)\big)\Big).$$

By repeated application of the above lemma, Lovász and Simonovits [LS90] argue that, if all of
the sets $T_i(\mathbf{1}_v'\mathcal{G}^t)$ are expanding, then $I(\mathbf{1}_v'\mathcal{G}^t, \cdot)$ approaches the straight line. In the next lemma we
show that, if all of the small threshold sets (i.e., those with $\mu(T_i(\mathbf{1}_v'\mathcal{G}^t)) \le \Gamma$) are expanding, then $I(\mathbf{1}_v'\mathcal{G}^t, \cdot)$
approaches the curve $x/\Gamma$.

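Lemma 4.2. For any vertex $v \in V$, $T > 0$, $0 \le \Gamma \le m$, and $0 \le \Phi \le 1/2$, if for all $t \le T$, all of
the threshold sets $T_i(\mathbf{1}_v'\mathcal{G}^t)$ of volume at most $\Gamma$ have expansion at least $\Phi$, then for any $0 \le t \le T$,

$$I(\mathbf{1}_v'\mathcal{G}^t, x) \le \frac{x}{\Gamma} + \sqrt{\frac{x}{\mu(v)}}\left(1 - \frac{\Phi^2}{2}\right)^t.$$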
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Proof. We prove by induction. The lemma trivially holds for t = 0. This is because the LHS is 
xj ii{v) for < X < ijl{v) and 1 for larger values of x, while the RHS is ^Jx/ iJ,{v) for all x > 0. Next, 
we prove the lemma's statement holds for t, assuming that it holds for t — 1. Let p := l'^Q^~^. 
First of all, since I{p'Q, .) is a piecewise-linear concave function of x, it is sufficient to prove the 
statement for values of x = fi{Ti(p'Q)). For x > F, the statement holds trivially, because the RHS 
is at least 1, while the LHS is less than or equal to 1. Now, suppose x < F, and x = fj.{Ti{p'G)). 
Using Lemma 4.1, we have 



where the first inequality uses the assumption that x < F < m, the second inequality uses the 
induction hypothesis, and the last inequality uses the inequality 



Now we are ready to prove Theorem 1.2: using Proposition 3.1 we show that /(l^^*) does not 
converge to x/F, for t ~ log'y/(j). Therefore, by the previous lemma, at least one of the small 
threshold sets is non-expanding. 

Theorem 1.2. There is a polynomial time algorithm that takes as input a target conductance $\phi$, and $0 < \epsilon < 1/4$, and outputs a set $S$, s.t. if $\phi(U) \le \phi$, for $U \subseteq V$, then $\mu(S) \le 2\mu(U)^{1+\epsilon}$, and

\[ \phi(S) \le \sqrt{2\phi/\epsilon}. \]

Proof. Let $\gamma = \mu(U)$, $T = \epsilon\ln\gamma/\phi$, $\Gamma = 2\gamma^{1+\epsilon}$, and $\Phi = \sqrt{2\phi/\epsilon}$. We show that Threshold($\phi$, $\epsilon$) returns a set of volume at most $\Gamma$, and conductance at most $\Phi$. Wlog we may assume that $\Gamma \le m$, otherwise the statement is trivial. We prove by contradiction; assuming that the output of the algorithm has volume larger than $\Gamma$, we show that Lemma 4.2 and Proposition 3.1 cannot hold simultaneously. First of all, by Proposition 3.1, there exists a vertex $u \in U$, such that



Let $p = \mathbf{1}_u'\mathcal{G}^T$. Let $w(v) = 1$, for all $v \in U$, and $w(v) = 0$ for the rest of the vertices. By equation (14), we have




On the other hand, by Lemma 4.2, we have



which is a contradiction. The last inequality uses the fact that $\epsilon < 1/2$. □

Next we show that using Lemma 4.2 we can provide a simpler, and yet stronger, proof of the result of Arora, Barak and Steurer [ABS10].

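Theorem 1.3. For any graph $G$, and $0 < \epsilon < 1$, if $\mathrm{rank}_{1-\eta}(D^{-1}A) \ge n^{(1+\epsilon)\eta/\phi}$, then there exists a set $S \subseteq V$ of volume $\mu(S) \le 4\mu(V)n^{-\eta/\phi}$ and $\phi(S) \le \sqrt{2\phi/\epsilon}$. Such a set can be found by finding the smallest threshold set of conductance $\sqrt{2\phi/\epsilon}$, among the rows of $(D^{-1}A)^t$, for $t = O(\log n/\phi)$.

Proof. Wlog we assume $\phi \le 1/2$, $\eta < \phi$ and $n^{\eta/\phi} > 4$. Let $T = \epsilon\ln n/\phi$, $\Gamma = 4\mu(V)n^{-\eta/\phi}$, and $\Phi = \sqrt{2\phi/\epsilon}$. We show that Threshold($\Phi$, $T$) finds a set of volume at most $\Gamma$ and conductance at most $\Phi$. We prove by contradiction; suppose that Threshold does not find such a set. Since $\mathcal{G} = \frac{1}{2}(D^{-1}A + I)$, $\mathrm{rank}_{1-\eta/2}(\mathcal{G}) \ge n^{(1+\epsilon)\eta/\phi}$. Therefore, by the next claim, there is a vertex $u$ such that

\[ \mathbf{1}_u'\mathcal{G}^T\mathbf{1}_u \ge \max\Big\{\frac{1}{2n}, \frac{\mu(u)}{2\mu(V)}\Big\}\, n^{(1+\epsilon)\eta/\phi}\Big(1 - \frac{\eta}{2}\Big)^T \ge \max\Big\{\frac{1}{2n}, \frac{\mu(u)}{2\mu(V)}\Big\}\, n^{\eta/\phi}, \]

where the second inequality uses $(1 - \eta/2)^T \ge e^{-\eta T} = n^{-\epsilon\eta/\phi}$.

Let $p = \mathbf{1}_u'\mathcal{G}^T$, $x = \mu(u)$, $w(v) = 1$ for $v = u$ and $w(v) = 0$ for the rest of the vertices. By equation (14), we have

\[ I(p, x) \ge \sum_{v} w(v)p(v) = p(u) \ge \max\Big\{\frac{1}{2n}, \frac{\mu(u)}{2\mu(V)}\Big\}\, n^{\eta/\phi}. \]

But, by Lemma 4.2, we have

\[ I(p, x) \le \frac{x}{\Gamma} + \sqrt{\frac{x}{\mu(u)}}\Big(1 - \frac{\Phi^2}{2}\Big)^T \le \frac{\mu(u)\,n^{\eta/\phi}}{4\mu(V)} + \frac{1}{n}, \]

which is a contradiction, since $n^{\eta/\phi} > 4$.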

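Claim 4.3. For any graph $G$, if $\mathrm{rank}_{1-\eta}(\mathcal{G}) \ge r$, then there is a vertex $u \in V$, such that

\[ \mathbf{1}_u'\mathcal{G}^t\mathbf{1}_u \ge \max\Big\{\frac{1}{2n}, \frac{\mu(u)}{2\mu(V)}\Big\}\cdot r\,(1 - \eta)^t. \]

Proof. Let $0 \le \lambda_1, \ldots, \lambda_n \le 1$ be the eigenvalues of $\mathcal{G}$. We use the trace formula,

\[ \sum_{v \in V}\mathbf{1}_v'\mathcal{G}^t\mathbf{1}_v = \mathrm{Tr}(\mathcal{G}^t) = \sum_{i=1}^{n}\lambda_i^t \ge r\cdot(1 - \eta)^t. \]

Now, let $U_1 := \{v : \mathbf{1}_v'\mathcal{G}^t\mathbf{1}_v < r(1 - \eta)^t/2n\}$, and $U_2 := \{v : \mathbf{1}_v'\mathcal{G}^t\mathbf{1}_v < \frac{\mu(v)}{2\mu(V)}r(1 - \eta)^t\}$. It follows that

\[ \sum_{v \in U_1 \cup U_2}\mathbf{1}_v'\mathcal{G}^t\mathbf{1}_v \le \sum_{v \in U_1}\mathbf{1}_v'\mathcal{G}^t\mathbf{1}_v + \sum_{v \in U_2}\mathbf{1}_v'\mathcal{G}^t\mathbf{1}_v < \frac{r(1 - \eta)^t}{2} + \frac{r(1 - \eta)^t}{2} = r(1 - \eta)^t. \]

Therefore, there is a vertex $u \notin U_1 \cup U_2$ that satisfies the claim's statement. □

□

5 Almost Optimal Local Graph Clustering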



In this section we use the volume biased ESP to design a local graph clustering algorithm with a worst case guarantee on the conductance of the output set that is independent of the size of $G$.

Let $(S_0, S_1, \ldots, S_\tau)$ be a sample path of the volume biased ESP, for a stopping time $\tau$. Andersen and Peres show that with a constant probability the conductance of at least one of the sets in the sample path is at most $O\big(\sqrt{\frac{1}{\tau}\log\mu(S_\tau)}\big)$.

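Lemma 5.1 ([AP09, Lemma 1, Corollary 1]). For any starting set $S_0$, and any stopping time $\tau$, and $\alpha > 0$,

\[ \hat{\mathbb{P}}_{S_0}\Big[\sum_{i=0}^{\tau-1}\phi^2(S_i) \le 4\alpha\ln\frac{\mu(S_\tau)}{\mu(S_0)}\Big] \ge 1 - \frac{1}{\alpha}. \]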



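Here, we strengthen the above result, and we show the event occurs with much higher probability. In particular, we show that with probability at least $1 - 1/\alpha$, the conductance of at least one of the sets in the sample path is at most $O\big(\sqrt{\frac{1}{\tau}\log(\alpha\cdot\mu(S_\tau))}\big)$.

Lemma 5.2. For any starting set $S_0 \subseteq V$, and any stopping time $\tau$, and $\alpha > 0$,

\[ \hat{\mathbb{P}}_{S_0}\Big[\sum_{i=0}^{\tau-1}\phi^2(S_i) \le 8\Big(\ln\alpha + \ln\frac{\mu(S_\tau)}{\mu(S_0)}\Big)\Big] \ge 1 - \frac{1}{\alpha}. \]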

Proof. Let

\[ M_t := \sqrt{\frac{\mu(S_0)}{\mu(S_t)}}\,\prod_{i=0}^{t-1}\frac{1}{1 - \psi(S_i)}. \]

Andersen, Peres [AP09, Lemma 1] show that $M_t$ is a martingale in the volume biased ESP. It follows from the optional sampling theorem [Wil91] that $\hat{\mathbb{E}}[M_\tau] = M_0 = 1$. Thus, by the Markov inequality, for any $\alpha > 0$, we have

\[ \hat{\mathbb{P}}[M_\tau \le \alpha] \ge 1 - \frac{1}{\alpha}. \]

By taking logarithm from both sides of the event in the above equation we obtain,

\[ \hat{\mathbb{P}}[\ln M_\tau \le \ln\alpha] \ge 1 - \frac{1}{\alpha}. \qquad (15) \]

On the other hand, by the definition of $M_\tau$,

\begin{align*}
\ln M_\tau &= \frac{1}{2}\ln\frac{\mu(S_0)}{\mu(S_\tau)} + \sum_{i=0}^{\tau-1}\ln\frac{1}{1 - \psi(S_i)} \\
&\ge \frac{1}{2}\ln\frac{\mu(S_0)}{\mu(S_\tau)} + \sum_{i=0}^{\tau-1}\psi(S_i) \\
&\ge \frac{1}{2}\ln\frac{\mu(S_0)}{\mu(S_\tau)} + \frac{1}{8}\sum_{i=0}^{\tau-1}\phi^2(S_i), \qquad (16)
\end{align*}

where the first inequality follows by the fact that $1/(1 - \psi(S_i)) \ge e^{\psi(S_i)}$, and the last inequality follows by Proposition 2.1. Putting (15) and (16) together proves the lemma. □

The previous lemma shows that for any $\gamma, \phi > 0$, if we can run the process for $T \approx \epsilon\log\gamma/\phi$ steps without observing a set larger than $\gamma^{O(1)}$, then, with probability $1 - 1/\gamma$, one of the sets in the sample path must have an expansion of $O(\sqrt{\phi/\epsilon})$, which is what we are looking for. Next we use the following lemma by Andersen and Peres, together with Proposition 3.1, to show that this event occurs with some non-zero probability. That is, for any $\epsilon < 1$, with probability at least $\approx \gamma^{-\epsilon}$, the volumes of all sets in the sample path of the process are at most $O(\gamma^{1+\epsilon})$. Then, by the union bound we can argue that both events occur with probability at least $\Omega(\gamma^{-\epsilon})$.

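Lemma 5.3 (Andersen, Peres [AP09, Lemma 2]). For any set $U \subseteq V$, $v \in U$, and integer $T > 0$, the following holds,

\[ \hat{\mathbb{P}}_v\Big[\max_{t \le T}\frac{\mu(S_t \setminus U)}{\mu(S_t)} > \beta\,\mathrm{esc}(v, T, U)\Big] < \frac{1}{\beta}, \qquad \forall \beta > 0. \]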



Lemma 5.4. Given $\gamma > 0$, $0 < \phi < 1/4$, $0 < \epsilon < 1$, such that $G$ has a set $U \subseteq V$ of volume $\mu(U) \le \gamma$, and conductance $\phi(U) \le \phi$. Let $T = \epsilon\ln\gamma/3\phi$. There is a constant $c > 0$, and a subset $U^T \subseteq U$ of volume $\mu(U^T) \ge \mu(U)/2$ such that for any $v \in U^T$, with probability at least $c\gamma^{-\epsilon}/8$, a sample path $(S_1, S_2, \ldots, S_T)$ of the volume biased ESP started from $S_0 = \{v\}$ satisfies the following,

i) For some $t \in [0, T]$, $\phi(S_t) \le \sqrt{100(1 - \ln c)\phi/\epsilon}$,

ii) For all $t \in [0, T]$, $\mu(S_t \cap U) \ge c\gamma^{-\epsilon}\mu(S_t)/2$, and henceforth, $\mu(S_t) \le 2\gamma^{1+\epsilon}/c$.
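Proof. First of all, we let $U^T$ be the set of vertices $v \in U$ such that

\[ \mathrm{rem}(v, T, U) \ge c\Big(1 - \frac{3\phi(U)}{2}\Big)^T. \]

By Proposition 3.1, there exists a constant $c > 0$ such that $\mu(U^T) \ge \mu(U)/2$. In the rest of the proof let $v$ be a vertex in $U^T$. We have,

\[ \mathrm{esc}(v, T, U) \le 1 - c\Big(1 - \frac{3\phi(U)}{2}\Big)^T \le 1 - c\Big(1 - \frac{3\phi}{2}\Big)^{\epsilon\ln\gamma/3\phi} \le 1 - c\gamma^{-\epsilon}. \]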

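Now, let $\beta := 1 + c\gamma^{-\epsilon}/2$. By Lemma 5.3, we have

\[ \hat{\mathbb{P}}_v\Big[\max_{t \le T}\frac{\mu(S_t \setminus U)}{\mu(S_t)} \le \beta\,\mathrm{esc}(v, T, U) \le 1 - \frac{c\gamma^{-\epsilon}}{2}\Big] \ge 1 - \frac{1}{\beta} \ge \frac{c\gamma^{-\epsilon}}{4}. \]

Since for any $S \subseteq V$, $\mu(S \setminus U) + \mu(S \cap U) = \mu(S)$, we have

\[ \hat{\mathbb{P}}_v\Big[\min_{t \le T}\frac{\mu(S_t \cap U)}{\mu(S_t)} \ge \frac{c\gamma^{-\epsilon}}{2}\Big] \ge 1 - \frac{1}{\beta} \ge \frac{c\gamma^{-\epsilon}}{4}. \]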

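On the other hand, let $\alpha := \gamma$. By Lemma 5.2, with probability $1 - 1/\gamma$, for some $t \in [0, T]$,

\[ \phi^2(S_t) \le \frac{1}{T}\sum_{t=0}^{T-1}\phi^2(S_t) \le \frac{8(\ln\gamma + \ln\mu(S_T))}{T}. \]

Therefore, since $\epsilon < 1$, by the union bound we have

\[ \hat{\mathbb{P}}_v\Big[\min_{t \le T}\frac{\mu(S_t \cap U)}{\mu(S_t)} \ge \frac{c\gamma^{-\epsilon}}{2} \ \wedge\ \min_{t \le T}\phi(S_t) \le \sqrt{\frac{8(\ln\gamma + \ln\mu(S_T))}{T}}\Big] \ge \frac{c\gamma^{-\epsilon}}{8}. \]

Finally, since for any set $S \subseteq V$, $\mu(S \cap U) \le \mu(U) \le \gamma$, in the above event, $\mu(S_t) \le 2\gamma^{1+\epsilon}/c$. Therefore,

\[ \phi(S_t) \le \sqrt{\frac{8(\ln\gamma + \ln(2\gamma^{1+\epsilon}/c))}{T}} \le \sqrt{\frac{100(1 - \ln c)\phi}{\epsilon}}, \]

which completes the proof. □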

To prove Theorem 1.1, we can simply run $\gamma^{\epsilon}$ copies of the volume biased ESP in parallel. Using the previous lemma, with a constant probability at least one of the copies finds a non-expanding set. Moreover, we may bound the time complexity of the algorithm using Theorem 2.2. The details of the algorithm are described in Algorithm 2.

Algorithm 2 ParESP($v$, $\gamma$, $\phi$, $\epsilon$)
1: Let $S_0 \leftarrow \{v\}$, and $T \leftarrow \epsilon\ln\gamma/(6\phi)$, and $c$ as defined in Lemma 5.4.

2: Run $\gamma^{\epsilon/2}$ independent copies of the volume biased ESP, using the simulator GenerateSample, starting from $S_0$, in parallel. Stop each copy as soon as the length of its sample path reaches $T$.

3: If any of the copies finds a set $S$, of volume $\mu(S) \le 2\gamma^{1+\epsilon/2}/c$, and conductance $\phi(S) \le \sqrt{200(1 - \ln c)\phi/\epsilon}$, stop the algorithm and return $S$.
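The control flow of Algorithm 2 can be sketched as below. This is only a skeleton under stated assumptions: the interface of GenerateSample is not given in this section, so the sampler is passed in as a hypothetical callback `generate_sample(adj, S)` returning the next set of the sample path; the constant `c` of Lemma 5.4 is taken as an input; and the parallel copies are emulated by advancing them one step at a time in round-robin order.

```python
import math

def volume(S, adj):
    return sum(len(adj[v]) for v in S)

def conductance(S, adj):
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    return cut / volume(S, adj)

def par_esp(adj, v, gamma, phi, eps, c, generate_sample):
    """Skeleton of ParESP(v, gamma, phi, eps).

    generate_sample(adj, S) is a caller-supplied (hypothetical) one-step
    simulator of the volume biased ESP; it is NOT implemented here.
    """
    T = math.ceil(eps * math.log(gamma) / (6 * phi))          # step 1
    copies = math.ceil(gamma ** (eps / 2))                    # step 2
    max_vol = 2 * gamma ** (1 + eps / 2) / c                  # step 3 thresholds
    max_phi = math.sqrt(200 * (1 - math.log(c)) * phi / eps)
    paths = [{v} for _ in range(copies)]                      # every copy starts at S_0 = {v}
    for step in range(T + 1):
        for i in range(copies):
            S = paths[i]
            if S and volume(S, adj) <= max_vol and conductance(S, adj) <= max_phi:
                return S                                      # first copy to succeed wins
            if step < T:
                paths[i] = generate_sample(adj, S)
    return None                                               # no copy found a good set
```

Because the copies advance in lockstep, the work spent on every copy is at most the work spent on the copy that produces the output, which is the property the running time analysis below relies on.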



Now we are ready to prove Theorem 1.1. 

Theorem 1.1. ParESP($v$, $\gamma$, $\phi$, $\epsilon$) takes as input a starting vertex $v \in V$, a target conductance $\phi \in (0, 1)$, a target size $\gamma$, and $0 < \epsilon < 1$. For a given run of the algorithm it outputs a set $S$ of vertices with the expected work per volume ratio of $O(\gamma^{\epsilon}\phi^{-1/2}\log^2 n)$. If $U \subseteq V$ is a set of vertices that satisfy $\phi(U) \le \phi$, and $\mu(U) \le \gamma$, then there is a subset $U' \subseteq U$ with volume at least $\mu(U)/2$, such that if $v \in U'$, with a constant probability $S$ satisfies,

1. $\phi(S) = O(\sqrt{\phi/\epsilon})$,

2. $\mu(S) \le O(\gamma^{1+\epsilon})$.

Proof. Let $U' = U^T$ as defined in Lemma 5.4. First of all, for any $v \in U'$, by Lemma 5.4, each copy of the volume biased ESP, with probability $\Omega(\gamma^{-\epsilon/2})$, finds a set $S$ such that $\mu(S) \le 2\gamma^{1+\epsilon/2}/c$, and $\phi(S) \le \sqrt{200(1 - \ln c)\phi/\epsilon}$; but, since $\gamma^{\epsilon/2}$ copies are executed independently, at least one of them will succeed with a constant probability. Therefore, with a constant probability the output set will satisfy properties (1), (2) in the theorem's statement. This proves the correctness of the algorithm.

It remains to compute the time complexity. Let $k := \gamma^{\epsilon/2}$ be the number of copies, and $W_1, \ldots, W_k$ be random variables indicating the work done by each of the copies in a single run of ParESP, thus $\sum_i W_i$ is the time complexity of the algorithm. Let $M$ be the random variable indicating the volume of the output set of the algorithm; we let $M = \infty$ if the algorithm does not return any set. Also, for $1 \le i \le k$, let $X_i$ be $1/M$ if the output set is chosen from the $i$-th copy, and 0 otherwise, and let $X := \sum_i X_i$. We write $\mathbb{P}_v^k[\cdot]$ to denote the probability measure of the $k$ independent volume biased ESP all started from $S_0 = \{v\}$, and $\mathbb{E}_v^k[\cdot]$ for the expectation. To prove the theorem it is sufficient to show

\[ \mathbb{E}_v^k\Big[X\sum_{i=1}^{k}W_i\Big] = O(\gamma^{\epsilon}\phi^{-1/2}\log^2 n). \]

By linearity of expectation, it is sufficient to show that for all $1 \le i \le k$,

\[ \mathbb{E}_v^k\Big[X_i\sum_{j=1}^{k}W_j\Big] = O(\gamma^{\epsilon/2}\phi^{-1/2}\log^2 n). \]

By symmetry of the copies, it is sufficient to show this only for $i = 1$. Furthermore, since conditioned on $X_1 \ne 0$, $W_1 = \max_j W_j$, we just need to show,

\[ \mathbb{E}_v^k[X_1W_1] = O(\phi^{-1/2}\log^2 n). \]

Let $\tau$ be a stopping time, bounded from above by $T$, indicating the first time where a set $S_\tau$ of volume $\mu(S_\tau) \le 2\gamma^{1+\epsilon/2}/c$, and conductance $\phi(S_\tau) \le \sqrt{200(1 - \ln c)\phi/\epsilon}$ is observed in the first copy if it is executed up to time $\tau$, and let $W_1(\tau)$ be the amount of work done by that time. Observe that for any element of the joint probability space, $X_1W_1 \le W_1(\tau)/\mu(S_\tau)$. This is because we always have $W_1 \le W_1(\tau)$, and $X_1 \le 1/\mu(S_\tau)$. Therefore,

\[ \mathbb{E}_v^k[X_1W_1] \le \mathbb{E}_v^k\Big[\frac{W_1(\tau)}{\mu(S_\tau)}\Big] = \hat{\mathbb{E}}_v\Big[\frac{W(\tau)}{\mu(S_\tau)}\Big] = O(T^{1/2}\log^{3/2} n) = O(\phi^{-1/2}\log^2 n), \]

where the second to last equation follows from Theorem 2.2. □



6 Lower Bounds on Uniform Mixing Time of Random Walks 

In this section we prove lower bounds on the mixing time of reversible Markov chains. Since any reversible finite state Markov chain can be realized as a random walk on a weighted undirected graph, for simplicity of notation, we model the Markov chain as a random walk on a weighted graph $G$.

The $\epsilon$-mixing time of a random walk in total variation distance is defined as

\[ \tau_V(\epsilon) := \min\Big\{t : \sum_{v \in V}\big|\mathbb{P}_u[X_t = v] - \pi(v)\big| \le \epsilon,\ \forall u \in V\Big\}. \]

The mixing time of the chain is usually defined as $\tau_V(1/4)$. The $\epsilon$-uniform mixing time of the chain is defined as

\[ \tau(\epsilon) := \min\Big\{t : \Big|\frac{\mathbb{P}_u[X_t = v] - \pi(v)}{\pi(v)}\Big| \le \epsilon,\ \forall u, v \in V\Big\}. \qquad (17) \]

It is worth noting that the uniform mixing time can be considerably larger than the mixing time in total variation distance.

Let $\phi(G) := \min_{S : \mu(S) \le \mu(V)/2}\phi(S)$. Jerrum and Sinclair [JS89] prove that the $\epsilon$-uniform mixing time of any lazy random walk is bounded from above by,

\[ \tau(\epsilon) \le \frac{2}{\phi(G)^2}\Big(\log\frac{1}{\min_v \pi(v)} + \log\frac{1}{\epsilon}\Big). \]

On the other hand, one can use $\phi(G)$ as the bottleneck ratio to provide a lower bound on the mixing time of random walks. It follows from Cheeger's inequality that (see e.g. [LPW06]),

\[ \tau_V(1/4) \ge \frac{1}{4\phi(G)}. \]
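For small graphs both mixing times can be computed directly from their definitions. The sketch below does this for the lazy walk on an unweighted graph; it follows the two displayed definitions literally (in particular, the total variation condition is taken as the unnormalized sum over $v$), and the helper names `lazy_step` and `mixing_times` and the cap `t_max` are illustrative assumptions.

```python
def lazy_step(p, adj):
    """One step of the lazy walk (D^{-1}A + I)/2 applied to the row vector p."""
    q = {v: p[v] / 2 for v in adj}
    for u in adj:
        share = p[u] / (2 * len(adj[u]))
        for w in adj[u]:
            q[w] += share
    return q

def mixing_times(adj, eps, t_max=100000):
    """Return (tau_V(eps), tau(eps)) for the lazy walk, by brute-force iteration."""
    vol = sum(len(nbrs) for nbrs in adj.values())
    pi = {v: len(adj[v]) / vol for v in adj}                   # stationary distribution
    rows = {u: {v: 1.0 if v == u else 0.0 for v in adj} for u in adj}
    tv_time = None
    for t in range(t_max + 1):
        # Worst start vertex for the summed deviation and for the relative deviation.
        l1 = max(sum(abs(row[v] - pi[v]) for v in adj) for row in rows.values())
        rel = max(abs(row[v] - pi[v]) / pi[v] for row in rows.values() for v in adj)
        if tv_time is None and l1 <= eps:
            tv_time = t
        if rel <= eps:
            return (t if tv_time is None else tv_time), t
        rows = {u: lazy_step(row, adj) for u, row in rows.items()}
    return tv_time, None

# Two copies of K4 joined by the single edge (3, 4).
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
print(mixing_times(adj, 0.25))
```

Since a relative error of at most $\epsilon$ at every vertex forces the summed deviation to be at most $\epsilon$, the first returned value never exceeds the second.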

In the next proposition we prove stronger lower bounds on the uniform mixing time of any reversible Markov chain.

Proposition 6.1. For any graph $G = (V, E)$, $1 \le \gamma \le \mu(V)/2$, and $0 < \epsilon < 1$,

\[ \tau(\epsilon) \ge \frac{\ln(\mu(V)/2\gamma)}{2\phi(\gamma)} - 2. \]

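Proof. Let $S \subseteq V$ such that $\mu(S) \le \gamma$, and $\phi(S) = \phi(\gamma)$. Let $t$ be an even integer with $-\ln(2\pi(S))/2\phi(S) - 2 \le t \le -\ln(2\pi(S))/2\phi(S)$. By the next claim, and equation (1), there exists a vertex $u \in S$ such that

\[ \mathrm{rem}(u, t, S) \ge (1 - \phi(S))^t \ge 2\pi(S). \]

Since $\mathbb{P}_u[X_t \in S] \ge \mathrm{rem}(u, t, S)$, there is a vertex $v \in S$ such that,

\[ \frac{\mathbb{P}_u[X_t = v]}{\pi(v)} \ge \frac{\mathbb{P}_u[X_t \in S]}{\pi(S)} \ge \frac{2\pi(S)}{\pi(S)} = 2, \]

where the first inequality uses $\mathbb{P}_u[X_t \in S] = \sum_{v \in S}\mathbb{P}_u[X_t = v]$. Therefore, $\big|\mathbb{P}_u[X_t = v] - \pi(v)\big|/\pi(v) \ge 1$, and by equation (17), for any $\epsilon < 1$, $\tau(\epsilon) \ge t$. The proposition follows from the choice of $S$; that is

\[ \tau(\epsilon) \ge t \ge \frac{\ln(\mu(V)/2\gamma)}{2\phi(\gamma)} - 2. \]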

Claim 6.2. For any (weighted) graph $G$, $S \subseteq V$, and integer $t > 0$,

\[ \pi_S'(I_SD^{-1}AI_S)^{2t}\mathbf{1}_S \ge (1 - \phi(S))^{2t}. \]

The proof is very similar to Proposition 3.1, except that here $I_SD^{-1}AI_S$ is not (necessarily) a positive semidefinite matrix. This is the reason that we prove the inequality only for even time steps of the walk.

Proof. Let $P := D^{-1/2}I_SAI_SD^{-1/2}$ be a symmetric matrix. By equation (7), $1 - \phi(S) = \pi_S'(I_SD^{-1}AI_S)\mathbf{1}_S$. Using equation (6), the claim's statement is equivalent to the following equation:

\[ \sqrt{\pi_S}'P^{2t}\sqrt{\pi_S} \ge \big(\sqrt{\pi_S}'P\sqrt{\pi_S}\big)^{2t}. \qquad (18) \]

The above inequality can be proved using techniques similar to Lemma 3.2. Let $v_1, \ldots, v_n$ be the eigenvectors of $P$, corresponding to the eigenvalues $\lambda_1, \ldots, \lambda_n$. By equation (11), (18) is equivalent to the following equation:

\[ \sum_{i=1}^{n}\langle\sqrt{\pi_S}, v_i\rangle^2\lambda_i^{2t} \ge \Big(\sum_{i=1}^{n}\langle\sqrt{\pi_S}, v_i\rangle^2\lambda_i\Big)^{2t}. \]

Since $\sqrt{\pi_S}$ is a norm one vector, and $f(\lambda) = \lambda^{2t}$ is a convex function, the above equation holds by Jensen's inequality. □

□
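Claim 6.2 is easy to check numerically on small examples. The following sketch computes $\pi_S'(I_SD^{-1}AI_S)^{k}\mathbf{1}_S$ exactly with rational arithmetic, i.e. the probability that a (non-lazy) random walk started from $\pi_S$ stays inside $S$ for $k$ steps, and compares it with $(1 - \phi(S))^{k}$ for even $k$. The function names, the unweighted example graph and the choice of $S$ are illustrative assumptions.

```python
from fractions import Fraction

def conductance(adj, S):
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    vol = sum(len(adj[u]) for u in S)
    return Fraction(cut, vol)

def stay_probability(adj, S, steps):
    """pi_S' (I_S D^{-1} A I_S)^steps 1_S, computed by propagating the row vector pi_S."""
    vol = sum(len(adj[v]) for v in S)
    q = {v: Fraction(len(adj[v]), vol) for v in S}     # q = pi_S restricted to S
    for _ in range(steps):
        nxt = {v: Fraction(0) for v in S}
        for u in S:
            for w in adj[u]:
                if w in S:                              # mass leaving S is discarded
                    nxt[w] += q[u] / len(adj[u])
        q = nxt
    return sum(q.values())

# Two copies of K4 joined by the single edge (3, 4); S is the path 2 - 3 - 4.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
S = {2, 3, 4}
phi = conductance(adj, S)
for t in range(1, 4):
    print(2 * t, stay_probability(adj, S, 2 * t), (1 - phi) ** (2 * t))
```

With one step the quantity equals $1 - \phi(S)$ exactly, matching the identity from equation (7) used in the proof; for two steps on this example it is $19/132$, which exceeds $(4/11)^2 = 16/121$.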

We remark that the above bound only holds for the uniform mixing time, and it can provide a much stronger lower bound than the bottleneck ratio, if $\gamma \ll \mu(V)$.